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Abstract 



Recent experimental and computational evidence suggests that several dynamical prop- 
erties may characterize the operating point of functioning neural networks: critical 
branching, neutral stability, and production of a wide range of firing patterns. We seek 
the simplest setting in which these properties emerge, clarifying their origin and re- 
lationship in random, feedforward networks of McCullochs-Pitts neurons. Two key 
parameters are the thresholds at which neurons fire spikes, and the overall level of feed- 
forward connectivity. When neurons have low thresholds, we show that there is always 
a connectivity for which the properties in question all occur: that is, these networks pre- 
serve overall firing rates from layer to layer and produce broad distributions of activity 
in each layer. This fails to occur, however, when neurons have high thresholds. A key 
tool in explaining this difference is eigenstructure of the resulting mean-field Markov 
chain, as this reveals which activity modes will be preserved from layer to layer. We ex- 
tend our analysis from purely excitatory networks to more complex models that include 
inhibition and "local" noise, and find that both of these features extend the parameter 
ranges over which networks produce the properties of interest. 



1 Introduction 

Many basic questions remain unresolved in understanding how simple features of net- 
work connectivity determine the statistical structure of their outputs. In particular, as 
we vary the average connectivity strength between model neurons, what kinds of tran- 
sitions occur in model dynamics? The first dynamical property we might study at a 
transition is neutral stability of trajectories. Intuitively, it appears that neutral stabil- 
ity could favor signal transmission, because it suggests that input patterns (and their 
noisy perturbations) will retain their original separation in state space, neither diverg- 
ing nor converging towards some fixed attractor (Bertschin ger & Natschlagerj 2004 



Legenstein & Maass 2007 ). The second, allied property that could occur as networks 



transition from weak to strong connectivity is the production of a wide range of out- 
put states - that is, a mix of firing patterns that induce a broad distribution with high 
response entropy. If responses are tallied via total network output, this could require 
statistical correlations of all orders (Amari et al., 2003); thus, higher-order correlations 
are another statistical property of interest at network transitions. Finally, an assay that 
involves all of these properties is the decodability of input patterns based on network 
outputs. 

But how are all of these properties related? Do networks ever exhibit all of them 
simultaneously, and if so, when? Developing the complete picture is a formidable chal- 
lenge; in this paper, we progress by answering these questions in what is probably the 
most tractable class of systems in which they can be studied. These are noisy, feedfor- 
ward networks of thresholding neurons [Nowotny & Huerta ( 2003[ ) . 

Several important prior studies of signal propagation in feedforward networks in- 
form our approach. These suggest that a wide range of network responses fails to occur 
in broad parameter regimes: rather, the only outputs produced are all cells firing or 
being silent simultaneously. This is due to the buildup of correlations among neural 
activity at each layer, even when inputs drive the cells to fire independently in the first 
layer. In particular, for iteratively constructed in vitro feedforward networks, neurons 
displayed a marked tendency to synchronize (Reyes} 2003[ ). Subsequent simulations and 
analyses with thresholding neurons have corroborated these findings, arguing that any 



initial spike count distribution becomes strongly bimodal after a few layers (Nowotny 



& Huerta] 2003[ ). Integrate-and-fire neurons similarly fail to transmit rates without de- 



caying or saturating to a point independent of the input rate, but are able to support 
propagation of highly- synchronized volleys of spikes ( |Litvak et al. 2003). In contrast, 
different studies demonstrate a "critical" regime with broadly distributed output patterns 
and significant higher-order interactions (Beggs & Plenz[ 2003 ; Yu et al. 201 1 1. 

As we will further explore here, the key difference among these studies turns out 
to be the threshold number of excitatory inputs that each cell must receive in order to 



fire (Kumar et al. , 2010). This threshold is low (a single spike) in the work of Beggs 



& Plenz| ( |2003l ) but much higher for |Nowotny & Huerta| ( |2003~] ), |Reyes| ( |2003| ), and 



Litvak et al. (2003). As reviewed in Kumar et al. ( 2010[ ), densely connected feedfor- 



ward networks with synapses that are weak relative to threshold tend to produce more 
synchrony than their sparsely connected counterparts, due to the neurons having more 
shared inputs (Rosenbau m et al.[|2010 ). Thus, as synaptic inputs are weak compared to 
spike thresholds in many biological settings, it may appear that synchrony is inevitable. 
However, noise "local" to each neuron decreases synchrony, and can do so without 
damaging the capacity to transmit signals, at least those defined by firing rates within 
each network layer ( |van Rossum et al.| ( |2002| ), but see |Nowotny & Huerta| ( |2003| ); |Reyes 
( 120031 ). 

Here, we undertake to unify these results through a common mathematical frame- 
work, and extend them by treating multiple assays of network outputs. In particular, we 
show when and how neutral stability, broad response distributions, higher-order corre- 
lations, and the transmission of firing rate signals coexist and when these properties fail 
to coexist. For any level of spike threshold, we find that there is always an intermediate 
value of network connectivity characterized by neutral stability and higher order corre- 
lations. High response entropy and transmission of firing rates, however, only occur at 
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this point when neurons have low thresholds or added noise. 

The narrative of the paper proceeds as follows. Section 2 gives the setup of the 
model. In Section 3, we introduce the branching ratio, describing how layer-averaged 
activity - that is, firing rates - are propagated from layer to layer. Next, in Section 4, 
we develop tools that give a more refined view of how activity is transmitted. Specif- 
ically, we show that the model can be reduced to a mean-field Markov chain, and that 
the eigenstructure of the corresponding transition matrix reveals intermediate parame- 
ter values for which the networks support persistent, broadly distributed responses. In 
Section 5 we study the resulting responses in terms of higher-order correlations and 
response entropy, showing that they are both maximized at this intermediate parameter 
regime. Section 6 introduces a combined metric, which assess the capacity of networks 
to preserve rates from one layer to the next while maintaining broadly distributed re- 
sponses. In Sections 7 and 8, we apply the same analyses to excitatory-inhibitory net- 
works and to those with localized "background" noise, and see that both factors increase 
parameter ranges over which the propagation of broad activity distributions with pre- 
served firing rates occurs. 



2 Model of stochastic feedforward networks 



Network: Following Nowotny & Huerta (2003), we examine a network of binary (Mc 



Culloch & Pitts[ [l943 ) neurons in a feedforward architecture with probabilistic synapses 



and input (see Figure [I^. for a schematic). Each layer consists of iV identical neurons. 
In general, we will illustrate iV = 20; results hold for larger N as well, as we summa- 
rize in the Discussion. The neurons are thresholding units that spike if they receive at 
least 9 synaptic inputs from neurons in the previous layer and are otherwise quiescent 
(i.e., silent). The connectivity structure between layers is random and spatially homoge- 
nous; each neuron upstream is connected to C postsynaptic neurons chosen uniformly 
from the downstream layer. Connections between neurons have a fixed probability p of 
synaptic transmission. 

Stimuli: The networks are driven by a stimulus to elicit an average spike count of 
S G {0, . . . N} firing neurons in the first layer at that time step. Unless otherwise 
specified, this stochastic input is injected independently so that each neuron in the first 
layer responds as an independent (0,1) Bernoulli random variable with biased probabil- 
ity S/N of spiking (taking value 1). This results in a binomially distributed spike count 
in the first layer. 

Propagation: The state of the Lth layer is denoted by x^, an iV-vector of zeros and 
ones, and the connectivity matrix between layers L and L + 1 by El- (Henceforth we 
will use E to refer to the AfxiVxL-1 connectivity matrix of the entire network.) 
Since the connections between neurons are stochastic, in a given trial each synapse fails 
with probability 1 — p. It will be useful to call the realization of E L according to the 
probability of synaptic transmission the "effective" connectivity matrix El, keeping in 
mind that different trials will yield different El yet El will remain fixed. The state at 
layer L of a realization of a given network can now be expressed as 

x L+1 = Q{E L x L - 0), 
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Figure 1: Average rate transmission in feedforward networks. (A) A schematic of a 
feedforward network. Filled circles indicate spikes, hollow circles quiescence (i.e., 
absence of firing). In this example, N = 4, L = 5, C = 2, p = 0.5, and 9=1. (B, C) 
Branching ratio a as a function of connectivity strength 7 for N = 20, (B) 9 = 1 and (C) 
9 = 7. Each data point is the branching ratio of a network of a particular connectivity 
structure. (D - F) Simulated propagation of firing rates shown for three sample networks 
with 9 = 1 and C = 3, p = 0.25, 0.43, 0.85, respectively. These parameters are 
also indicated by the markers in (B). Noisy, uncorrelated input is injected into the first 
layer, and the resulting firing rates are averaged over 1000 trials plotted over multiple 
layers. Error bars indicate standard deviation scaled by a factor of 1/10 to facilitate 
comparison. Vertical grey bars are shown at L = 5 to emphasize that henceforth we 
will primarily be concerned with shallow layers. (D) In subcritical networks (er < 1), 
activity tends to die out. (F) In supercritical (a > 1) networks, activity saturates. (E) 
Critical networks (a ps 1) reveal greatest fidelity in propagating Poisson input rates 
through layers; however, while this picture is qualitatively true for networks of low- 
threshold neurons, when 9 reaches higher values, networks tend to transmit only high 
or low rates (see Section 6). 

where is the elementwise Heaviside step function. The key parameter in this system 
is the connectivity strength 7 = Cp. 

Limitations and simplifications: We note several important facts about the model setup 
and analysis. First, this model has no time in its dynamics; each trial can be thought 
of as a wave of activity evolving from a particular initialization in the first layer, and 
is independent of the next. Because of this, the phenomenon of synchrony in the usual 
temporal sense is not applicable. The corresponding concept of synchrony is when 
neurons in a layer tend to fire, or be quiescent, together in a given trial; this is what we 
will mean in the remainder of this paper when we refer to synchrony or synchronous 
coding. Second, because of the assumption of spatial homogeneity both in inputs and 
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in network architecture, this model is not well-suited to study spatial modes of activity. 

Third, and most importantly, our analysis henceforth focuses on the total activity 
within each layer. That is, rather than quantify network responses in the full space of 
2 N firing patterns that can occur in each layer, we restrict our description to the number 
of cells that fire in that layer: the N + 1 different values of the (layer-averaged) firing 
rate. 



3 The branching ratio 

To understand the qualitative dynamics and average firing rate transmission through 
multiple layers, we borrow a useful tool from the criticality literature ( |Beggs & Plenz[ 



2003). A "critical" transition regime is often experimentally defined via the branching 
ratio a, the ratio between the number of cells in a population firing at a particular time 
step and the number of cells firing at the previous timestep, averaged over time. To 
avoid decay or growth of activity, the system must produce firing rate dynamics which 
are neutrally stable, satisfying such networks are labeled critical. 

In our feedforward framework, the relevant measurement is the branching ratio av- 
eraged over trials and layers rather than time. To quantify the general capacity of a 
particular network with fixed connectivity structure E to maintain activity in a one-to- 
one manner, we will also average this layerwise branching ratio over repeated trials 
with the same network, each with different stimulus rates as well as different (random) 
inputs xx to the first layer. The result is: 

S L 

a " 




Sl-i 



where Sl is the number of neurons spiking in layer L on a given trial. Throughout this 
paper, when we refer to the branching ratio we will mean a. 

We conducted Monte Carlo simulations to compute how this quantity changes with 
connectivity level 7. In detail, at a fixed 7, we first chose one example of a network 
structure E for every C > [7] , the constrained value ensuring that p < 1, For each E, 
we then input a deterministic rate of exactly S — 9 + 1, . . . , N spiking neurons in the 
first layer with 100 random instantiations of xi, evolve the network, and measure the 
ratios Sl/Sl-x for each layer until either the neural activity dies out or until the last 
layer is reached. Finally, a is computed as the average over the 100 random network 
realizations and instantiations at the first layer, and subsequently over all stimulus levels 
greater than 9 spiking neurons per layer (as any input less than that is guaranteed to be 
extinguished at the next layer.) 

Figure[ljBC shows results over five layers. Each of the tight scatter of points at each 
value of connectivity 7 is the branching ratio of a particular network with that value of 
connectivity and a specific architecture E. (The fact that there is very little variation at 
a given level 7 supports our choice of this combined parameter as the principal one in 
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our studyjj) Note that as we sweep connectivity 7 from small to large values, we pass 
through a value 7 b s at which a ~ 1. Thus, we find that the transition (critical) branching 
parameter is consistently found in our networks at some intermediate connectivity level. 

We next illustrate the implications of the branching parameter for propagation of 
firing rates across network layers. For many different networks, we compute rate tra- 
jectories averaged over 1000 trials for input rates ranging from to N = 20 neurons 
firing in the first layer. In each trial, the input rate S and E are fixed, yet xi and E 
change probabilistically. The evolution of the firing rate over 20 layers is shown for 
three representative networks with threshold 9 = 1 in Figure [lp-F. In subcritical net- 
works (Figure [Tp) neural activity dies after a few layers regardless of stimulus. The 
supercritical, i.e. o > 1, network (Figure [TJi) inflates rates to nearly maximal values, 
and as in the subcritical case it is difficult to distinguish between two inputs based on 
output rate alone. In critical networks, however, rate trajectories remain separated at 
each layer (Figure [Tp). This result is in agreement with other findings in the literature 
regarding information transmission of critical networks. Overall, these simulations con- 
firm the expected picture that the average firing rate statistic is best propagated through 
networks when a ~ 1, 

Beyond preservation of firing rates from one layer to the next, we are interested in 
networks that can produce a broad distribution of responses, and avoiding the limita- 
tions of strong synchrony. To assess this, in the next section we introduce a tool to 
describe propagation of firing rate distributions across layers via a mean-field approxi- 
mation. 



4 Mean-field Markov chain model and spectral analysis 

Since the state of each layer depends solely on the state of the previous layer and 
the synaptic connections between layers, our feedforward networks are Markov chains 



(Nowotny & Huerta, 2003). Furthermore, as we aim to describe only the propagation 
of layer- averaged firing rates, rather than particular firing patterns (or binary "words"), 
our Markov chain has N + 1 states. We proceed to derive a mean-field description of 
the Markov chain for each connectivity level 7 by averaging over possible connection 
matrices. This mean-field model becomes exact in the special case of all-to-all connec- 



tivity (C = 1), for which Nowotny & Huerta ( 2003| ) developed the same description. 



For the excitatory networks considered in the main part of the paper (Sections 1-6), we 
have verified numerically that the mean-field model is a good predictor of the true spike 
count dynamics except in the deterministic limit of large p and small C (see Appendix). 

The mean-field transition matrix A - i.e., the matrix whose (n, m)th element is the 
probability that there are S L = m neurons spiking at a layer given S L = n spiking in 



In more detail: in following sections we reduce the two parameters C and p dic- 
tating network connectivity to the single connectivity parameter 7; this is supported by 
the observation that variation in a for fixed 7 but varying values of C and p is has a 
negligible impact on the branching ratio, as shown by the tight scatter of points at each 
7 in Figure [TJ3C. Moreover, in the mean-field theory we develop, it is also the case that 
7, rather than C and p separately, enters. 
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the previous layer - is given by: 



A nm ={ N ) q ™(l-q n ) N - 



m 



if n > 9. Here, q n is the probability that a downstream neuron will fire assuming n 
spiking neurons in the previous layer: 



•fl \ / ry \ k / ry \ n — k 



kj \NJ V 1 N 



lfn<9, then q n < and A nm = 0. The mean-field transition matrix A can be derived 
from the transition matrix of the original Markov chain (see Appendix for details). 

Using the transition matrix, the spike count probability distribution P L at layer L 
(the vector of length N + 1 whose jth component is the probability that j neurons are 
firing in layer L) is simply given by matrix-vector multiplication: Pl = Pl-\A. This 
averaged system is inherently permutation-symmetric due to a lack of spatial structure 
in the network connectivity. 

The long-term behavior of these feedforward networks can be predicted using the 
eigenvalues and eigenvectors of the mean-field transition matrix. To illustrate this, as- 
sume A is diagonalizable, so that the input probability distribution Pi npu t = P\ can be 
decomposed into a linear combination of the eigenvectors of A: 

-Pinput = ao^o + ot\V\ H h a N v N . 

The spike count probability distribution at the Lth layer is simply P L = Pi nput A L : 

P L = a X L v + aiAf^i H h a N X^v N . 

The persistence of different eigenmodes through layers is determined by the size of 
their corresponding eigenvalues If Aj <C 1 then after a few layers the contribution 
ith eigenmode will decay to become negligible. On the other hand, eigenmodes whose 
eigenvalues are near 1 will survive through many layers. 

We analyze the eigenstructure of A through a combination of mathematical analysis 
and computational argument. First, it can be proven that A has one unique stationary 
state corresponding to all neurons being quiescent: v a f{ = (1, 0, . . . , 0) (Proposition 1 in 
Appendix). Second, if A is well-behaved in the sense that its eigenvectors have limits as 
7 — > N (an assumption that is supported by numerics, see Figure[2j^C), then the second 
largest eigenvalue A* of A converges to 1 as 7 -» N, indicating the emergence of an 



In Markov chains with very non-normal transition matrices, transient activity can 
persist past that expected solely by the spectrum; these matrices can be analyzed 
through their prominent pseudospectral sets, which are the eigenvalues of small per- 
turbations of the matrix. When we pursued this type of analysis of A, we did not find 
significant pseudospectral sets that described the persistent activity of our networks 



beyond expectations from the spectral analysis (results not shown; see Trefethen & 



Embree (2005 ) for more details on pseudospectral analysis). 
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Figure 2: Spectral analysis of the mean transition matrix for networks with (A, B) 9 = 1 
and (C, D) 9 = 7. (A, C) The second largest eigenvalue A* (solid line) effectively con- 
verges to 1 while the angle between v* and the vector corresponding to full synchrony 
v on (dashed line) maintains significant value for a range of 7, indicating that the second 
eigenmode is both persistent and far from bimodal. This is also illustrated by the in- 
sets, which show typical histograms on the line quasi-attractor either for an intermediate 
value of 7 (circle markers on dashed line in A, C and on the second dominant eigenvalue 
in B, D) or when 7 is too high to support broadly distributed eigenmodes, resulting in 
bimodal distributions (triangular marker in A, B). (B, D) Also at the emergence of the 
line quasi-attractor (circle markers), all eigenvalues are near-maximal compared to their 
values over the entire connectivity range. 



additional dimension of persistent activity. The catch, however, is that the correspond- 
ing eigenvector v* converges to a vector in the subspace of bimodal or synchronous 
distributions - that is, to the span of v k and t> on where t> on = (0, 0, 1) corresponds 
to all neurons firing (Proposition 2 in Appendix). All other eigenmodes must converge 
to as 7 — > N. So despite the emergence of this extra persistent dimension, activity 
becomes synchronous as connectivity strength increases. Ideally, what we want is for 
A* to be practically 1 yet for the span of v* and v g to be far from the plane of bimodal 
distributions. 

Intriguingly, numerical calculations reveal that this does occur for an intermediate 
level of connectivity 7 e i g (Figure [2j\C), implying the emergence of a plane spanned by 
the two principal eigenmodes v* and y ot f that, due to increased persistence, effectively 
acts as an attractor in sufficiently shallow layers: because of this, we will call this plane 
quasi-attracting. Once the vectors are normalized to represent probability distributions, 
this means that at 7 eig , there exists a line quasi-attractor that is far from the space of 
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Figure 3: Geometrical interpretation of spike count propagation for N = 2, 6 = 1. The 
spike count histogram evolves through layers according to the mean-field model via 
matrix multiplication, affecting rotation in the space of spike count probability distribu- 
tions. (A) For low connectivity strengths, the input distribution quickly converges to the 
plane spanned by the first two eigenmodes (the rainbow plane, although histograms will 
be constrained to the line embedded in the plane satisfying J2i Pi = 1) as me network 
encodes the signal. The distribution then slowly converges to the true stationary state 
v ff, and the signal decays. In this example convergence to quiescence occurs in only a 
few layers. (B) For high connectivity levels, activity persists through deeper layers, but 
the line quasi-attractor has rotated closer to the space of bimodal distributions spanned 
by v on and v &. The ideal network lies between these two figures. See text for a more 
detailed discussion. 

bimodal distributions, and hence that the network can support broadly distributed, in- 
completely synchronized firing states. At this same intermediate 7 e ; g , we also observe 
significant values of all eigenmodes (Figure [2jBD), showing further persistent activity 
contributed by other eigenmodes, at least for the initial network layers. 

We pause to give a geometrical view of the mean-field dynamics described above. 
This is illustrated in Figure [3] for N = 2, although the following description holds for 
arbitrary N. Consider the (N + 1) -dimensional space of the spike count probability 
distribution at a layer. Starting with any input probability vector Pi nput , the layer-to- 
layer mean-field dynamics of the network can be visualized as iterated rotations of the 
input vector Pi nput in the space of spike-count distributions, constrained to the simplex 
^\ \Pi\ = 1. In Figure [5J repeated applications of A are enumerated. In the first 
few iterations, the spike count distribution converges towards the line quasi-attractor 
spanned by v ff and v* as smaller eigenmodes decay. This represents rapid encoding 
of the input distribution. Eventually, the system drifts to the stationary state where all 
neurons are silent, t> D ff. This of course represents a final state in which the network 
has "forgotten" the input. If 7 < 7 eig , then the convergence to v ofi - happens within a 
few layers (as in Figure [3}\). When 7 > 7 eig , although activity persists through many 
layers as expected, the line quasi-attractor has rotated nearer to the span of v on and 
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v ff, so that the persistent activity is nearly synchronous (see Figure |3jB). It is only 
when 7 7 eig that activity is persistent while resisting synchrony. In this sense, 7 e i g 
represents the existence of a persistent mode of activity characterized by a balance of 
principal eigenmodes that are broadly distributed, avoiding firing patterns being limited 
to synchrony or quiescence. In fact, as we will explore in the following section, 7 « 7 eig 
also predicts further interesting statistical features of network responses. 



5 Statistical structure of network responses 



Next we will quantify the statistical features of the network responses over the range of 
connectivity strengths and threshold levels. First, when do responses show that neurons 
in a given layer fire with higher-order correlations - that is, in a way that cannot be 
predicted from their pairwise spike correlations alone? Beyond their basic role in char- 



acterizing the degree of coordinated spiking in networks (Shl ens et al.[ |2006, Sc hneid- 
man et al.j |2006| [Martign on et al.[ |2000| |Staude et~aL| |2010 ), higher-order correlations 
have been shown to be necessary to produce broad distributions spiking activity (|Amari 
et al.[|2003[ ) (for recent applications, see |Macke et al.| ( |2011| ); |Yu et al.| ( |201ip ), and to 



significantly impact coding of stimuli (Ganmor et al. 2011 Montani et al. , 2009). 

To calculate the extent of higher-order correlations, we utilize maximum entropy 
models (Shlens et al. 2006, Schneidman et al. 2006; Jaynes[ 1957). The pairwise 
maximum entropy fit of a probability distribution is defined as the distribution that has 
maximal entropy while being constrained to match the first and second moments of the 
true distribution. Thus, this fit makes the fewest additional assumptions on the structure 
of the probability distribution - any additional structure is attributed to higher-order 
correlations. For the permutation- symmetric networks at hand, the pairwise maximum 
entropy distribution is given by: 



jME, 



n 



exp{A]n + A 2 ^ 2 } 



where Z is a normalizing factor and Ai, A2 are the parameters chosen to match the first 
two moments. To quantify the effect of higher-order correlations present in the system, 
we compute the stimulus-averaged Jensen-Shannon (JS) divergence between the true 
distribution and its maximum entropy fit: 



N 




m=0 




\{m) + Pl IE < 



m) 



This quantity assumes values < D JS (P L , P^ ) < 1; it can be thought of as a sym- 
metrized version of the Kullback-Leibler divergence. 

Recall that each neuron in the first layer is independently stimulated so that their fir- 
ing is a Bernoulli trial, so no correlations are injected into the network. All correlations, 
pairwise and higher-order alike, emerge solely from the network interactions. We com- 
puted the JS divergence between spike count distributions distributions at layer 5 and 
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Figure 4: Statistical features of network responses for (A, B) 6 = 1 and (C, D) 6 = 
7. (A, C) Response entropy (dashed grey line), and stimulus-averaged higher-order 
correlations (solid black line) plotted as a function of 7. Also shown are 7 e ; g (hollow 
arrow below panel) and 7 obs (solid arrow). (A) When 9 = 1, peaks in both curves 
line up with 7 eig , as does 7 obs (arrows offset for visibility). (B) For higher threshold 
networks, 7 ODS doesn't align with 7 e i g or other assays (see Section 6). (C, D) Higher- 
order correlations of spike count histograms along the line quasi-attractor (solid lines; 
axis parameterizes distributions along the line quasi-attractor starting at t> ff). Compare 
with higher-correlations averaged over the entire space of histograms (dashed lines). 

their pairwise maximum entropy conditioned on input rate, then average this over all 
possible stimuli. Through this assessment, we note significant higher-order correlations 
already by the fifth layer at 7 e ; g (Figure |4j\B, solid line). 

How can we understand how these higher-order correlations arise? We next show 
that they can be predicted from the spectral analysis of the previous section, without 
the need for simulation. Figure |4pD plots the JS divergence between the spike count 
histograms on the line quasi-attractor and their pairwise maximum entropy fit. Here, 
we plot this quantity as a function of their position along the line, parameterized so that 
v ff is at position 0. This can be compared to an average JS divergence of approximately 
0.08 (dashed lines; calculated by averaging over 10,000 random sample distributions so 
that the mean had converged) over the entire space of possible response histograms. In 
particular, the eigenvectors at 7 e i g produce significantly larger higher-order correlations 
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than the average. This is because at this level of connectivity, the response distribution is 
a mixture of two distributions: a large component of quiescent neurons corresponding to 
t> ff, and a broader component corresponding to the contribution of v*. As 9 increases, 
the level of higher-order correlations decreases on the line quasi-attractor, as higher 
thresholds reduce the breadth of v*. 

The second statistical feature of note is the response entropy of the spike count 
distribution: 

N 

H(P(S L )) = ^P(S L = n)log 2 P(S L = n). 

n=0 

Larger response entropies indicate broader response distributions. The response entropy 
at the 5th layer peaks at 7 e i g , indicating that the emergence of the line quasi-attractor cor- 
responds to the broadest distribution of activity across all values of 7 (Figure |4} dashed 
grey line). However, the peak response entropy decreases for higher values of 9; this is 
the result of the fact that as 9 increases, v* produces less broad response distributions 
due to the high threshold and hence the silencing of weak inputs, preventing them from 
eliciting any firing in the subsequent layer. 

In sum, we have shown that at 7 eig , networks display maximal response entropy and 
significant contributions from higher-order correlations, directly because of the contri- 
bution of other eigenmodes at that level of connectivity. 



6 Combining neutral stability and broad response dis- 
tributions 

In order to maintain averaged levels of activity without succumbing to synchrony, a 
network must simultaneously satisfy two criteria. The first is that it must be able to 
preserve averaged firing rates from layer to layer without succumbing either to runaway 
excitation and maximal firing rates in deeper layers, or to decaying network activity. 
Second, a network must exhibit a broad spike count distribution at each layer in order 



to prevent the buildup of correlations and synchrony (Kumar et al.l 2010, Reyes, 2003 



Litvak et aL} |2003| ). We refer to these properties, taken together, as asynchronous rate 
coding. 

For which parameter regimes can such asynchronous rate coding occur? To quantify 
this we need an assay that captures how well networks are able to propagate broad 
response distributions from one layer to the next. We base this on the propagation of 
binomial spike count distributions, as these correspond to fully independent activity 
in each layer. Specifically, we define the spike count JS divergence D to be the JS 
divergence between the binomial input distribution Pi and the Lth layer spike count 
distribution P L averaged over all stimuli S: 

1 - 

Q ) = ]vTT E D Js(P(S L \S = n), P(5i|5 = n)). (1) 

n=0 

Networks will exhibit good performance, measured by low values D, when they main- 
tain the broad (independent) spike count distribution and the averaged firing rate that 
occurs in the first layer. 
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Figure 5: Spike count JS divergence plotted as a function of 7 for increasing 9. Optimal 
performance for each threshold (the minimum value of the curve) occurs near 7 eig . 

Plots of the spike count JS divergence over 7 are shown in Figure [5] for increasing 
values of the spike threshold 9. For each fixed 9, there is an optimal, intermediate 
value of 7 at which networks are best able to satisfy both of our criteria. However, as 
threshold level increases, the best value of the spike count JS divergence also increases, 
showing that high-threshold networks fail to produce asynchronous rate coding. 

This failure follows from the requirements of neutral dynamics and broad response 
distributions described in previous sections. First, from Section 3, 7 G b s captures the first 
criterion of complex signal coding outlined above: that is, networks demonstrate neu- 
tral stability and average one-to-one rate transmission when they average a branching 
ratio of a ~ 1. On the other hand, Section 4 shows that 7 eig reflects when the network 
supports persistent, broad response distributions, providing an assay of the second cri- 
terion. Complex signal propagation can therefore only occur in these systems when 
7obs ~ 7ei g - Comparing Figure [2] with the previous Monte Carlo simulations in Fig- 
ure [TjBC reveals both criteria can indeed be simultaneously satisfied when few inputs 
are required to cause a spike, but a gap between these required values of connectivity 7 
appears with increasing 9. To be precise, for N = 20, 9 = 1, Monte Carlo simulations 
and spectral analysis both yield 7 t, s ~ 7 e i g ~ 1.3. When 9 = 7, however, simula- 
tions show 7 b s ~ 13.75 yet 7 e ; g « 10.5. As shown through the eigenstructure, at 7 G b s , 
only bimodal activity is supported after a few layers. In fact, because of the inevitable 
synchrony in deep layers, optimal performance under the JS divergence tends to fall 
nearer to 7 e i g than to 7 b s - Networks of high-threshold neurons are therefore unable to 
simultaneously satisfy both requirements of complex signal propagation outlined at the 
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beginning of this section. 

Intuitively, the reason that 7 obs > 7 e i g is that networks with high-threshold neurons 
reject inputs of low firing rate, so that when 9 is large there is an increased likelihood 
that connectivity structure and stochasticity will conspire to silence all activity in the 
next layer. Geometrically speaking, as 9 increases so does the nullity of A, resulting in 
a larger and larger subspace that trajectories must avoid lest they risk susceptibility to 
network quiescence; in order to reach an average of one-to-one rate transmission, it is 
necessary to provide a buffer for the coding subspace from the nullspace by inflating 
the connectivity into the regime of bimodality. 

Also of practical importance is the question of robustness to parameters. The deli- 
cate nature of 7 e i g and 7 obs constrains networks that produce asynchronous rate coding 
to finely-tuned connectivity strengths; one requires that the branching ratio lie at a crit- 
ical value a rs 1, while the other relies on a precise balance between persistent yet 
broadly supported eigenmodes. This sensitivity is reflected in the sharp troughs in the 
JS divergence (Figure[5]); for even higher N, these troughs become even sharper and ro- 
bustness is a more important goal to obtain. As we will see in the next section, however, 
this sensitivity can be mitigated by adding an inhibitory population to each layer. 

To summarize results thus far, we evaluate networks on two criteria: a ~ 1 and 
broad response distributions. Low-threshold networks can always satisfy broad re- 
sponse distributions and maintained average rate transmission at the same 7. High- 
threshold networks are able to somewhat support broad distributions, although the pre- 
served aspects of network responses and their lower values of response entropy indicate 
less broad distributions as compared to their low-threshold counterparts. They also can 
satisfy a « 1, however this is due to averaging: because of the increasing nullity of 
the mean transition matrix, these networks cannot propagate weak input stimuli, so 
they must overcompensate by inflating 7. Because of this, no high-threshold network 
of a fixed 7 can simultaneously both criteria, and hence they cannot propagate rates 
asynchronously through layers. This appears to be a significant limitation for high- 
threshold networks - and, importantly, for many biological neural networks, in which 
many inputs are required to elicit a spike. In the following sections we will incorporate 
additional biophysical features - inhibition and noise - and study whether this provides 
a resolution so that high-threshold networks can support persistent, broadly distributed 
activity. 

7 Excitatory-inhibitory networks display increased ro- 
bustness 

How can asynchronous rate propagation emerge in high-threshold networks? Intu- 
itively, we might expect an added inhibitory population to prevent runaway excitation 
and saturation of firing rates to high values, thus preventing synchrony. To test this, we 
added an inhibitory population of Nj neurons to each layer of Ne = N — Nj excitatory 
neurons, and further impose N E — Nj > 9 (otherwise no activity could be transmitted 
due to the homogeneity in network connectivity - even if only the excitatory population 
is active in layer L, the random connectivity imposed will cause the same proportion 



14 



e = 7 




connectivity y connectivity y 



Figure 6: Excitatory-inhibitory networks display increased robustness, N E = 16 and 
Ni = 4. (A) The second largest eigenvalue A* and the angle between v* and v on (dashed 
line) overlap over a larger parameter space, indicating robustness at 7 e i g ; (B) similarly, 
all eigenvalues A of have broader peaks. Inset shows a typical broadly distributed his- 
togram at 7 eig (indicated by the marker in A, B). (C) The spike count JS divergence has 
a wider minimum for all values of 9, showing that inhbition also allows for more robust 
asynchronous rate propagation. 



of the excitatory and inhibitory populations in layer L + 1 to fire). Network parame- 
ters are assumed to be homogenous among the inhibitory and excitatory populations. 
Because of this assumption, it is straightforward to calculate the new, four-dimensional 
mean-field transition matrix Ai n : 

P(Sf = m E , Si =m I \Sf_ l = n E , Sf_ 1 = m) = 

l^ E \n m E (-\ — n \N E -m E ( Ni \ mi (i _ \N z -mi 
\m E j n V n E, n i) A ImJ B,nf l lriE,n I ) j 

where S l L is the number of cells spiking in the excitatory (i = E) or inhibitory (i = I) 
population at the Lth layer, and q nE , ni is the probability that a downstream neuron 
spikes given n E spiking excitatory neurons and nj spiking inhibitory neurons in the 
upstream layer: 
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The binomial input distributions now take the following form: 



N E \(n\ m E(^ n \ N E-m E / 7V 7 \ / n \ m ' n\ N i- m i 



P(Sf = m E , S[ = mj\S = n 

K m E J \NJ V NJ X [mi) \n) V NJ 

The expression for the transition matrix for the excitatory-inhibitory networks has 
a similar form to that of the purely excitatory networks, so the eigenstructure of A\ n is 
similar to that of A: it has a unique stationary state corresponding to all neurons being 
quiescent, and as 7 — > N, the second largest eigenvalue converges to 1 and its eigen- 
vector corresponds to bimodality (Proposition 3 in Appendix). There also is an inter- 
mediate state of connectivity at which A* ~ 1 and v* is far from bimodal (Figure |6]AB). 
Here we consider 9 = 7, as well as N E = 16, iVj = 4 to simulate ^20% inhibiti on 



branching 



as typically used in, for example, cortical modeling (cf. Braitenberg & Schtiz 
This yields 7 e j g ~ 16.9. However, according to Monte Carlo simulations, the 
ratio is always less than 1 for all 7 < N. Firing rates thus fail to be maintained in 
this network, as reflected in the spike count JS divergence in Figure [6p. The reason is 
that A m is so structurally similar to A: as in the purely excitatory networks, the high 
threshold still rejects weak inputs and sends them to the stationary state of quiescence, 



v a ff. This is in agreement with Reyes ( |2003 ), who found that adding a homogenous 



inhibitory population to each layer does not help networks avoid synchrony. 

Inhibition does, however, increase the robustness of JS divergence to perturbations 
in connectivity strength 7. Specifically, the troughs of minimal JS divergence widen 
compared to those of purely excitatory networks (Figure[6]C). This is reflected as well in 
the eigenstructure: the intermediate state of persistent, broadly distributed distributions 
is now stretched to cover a wider range of 7 (Figure |6|\B). This robustness further grows 
as the size of the inhibitory population is increased, so long as N E — Nj > 9 (results not 
shown). Intuition for this effect can be obtained by comparing to the purely excitatory 
case. Suppose a typical neuron in this case has n inputs. To produce a broad range of 
responses and avoid either too many inputs (resulting in maximal firing rates) or too 
few (resulting in quiescence), n must hover near some critical value that depends on 
particular choice of parameters. Now consider an excitatory-inhibitory network: then 
the typical neuron has (1 — Nj/N E )n net inputs when taking into account inhibition. 
Since 1 — Nj/N E < 1, this slope is shallower than that for purely excitatory networks, 
so the networks are more robust to chance perturbations around the critical value of 
inputs, and hence to connectivity strength. 

Increased robustness to connectivity parameters in the presence of inhibition is inter- 
esting as it addresses a major concern regarding the plausibility of dynamics at critical 
transition values of connectivity (as discussed in the previous section). In sum, inhibi- 
tion may help resolve the need for fine-tuning by enhancing robustness to fluctuations 
in network connectivity. 

J We emphasize that results in this section are not particular to these specific choices 
of Nj and N E . As long as N E — Nj > 9, the intermediate 7 e i g producing broad, 
persistent distributions will continue to exist. Other results regarding robustness, and 
limitations on asynchronous rate propagation high 9, also continue to hold. 
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8 Impact of background noise 



The next attempt to recover asynchronous rate propagation follows from |van Rossum 
et al.| ( |2002[ ), in which a noisy background current was shown to enhance the preser- 
vation of firing rates in feedforward networks of integrate-and-fire neurons; see also 
Nowotny & Huerta| ( [2003] ), |Reyes| ( |2003] ), and |Litvak et al.] ( |2003| ). We inject back- 
ground noise in the form of independent, zero-mean Gaussian independent noise cur- 
rent x to eacn neuron, \ ~ -^"(0, & x )- This transforms the heaviside-like thresholding 
into a smoother, sigmoidal operation. The probability that a neuron will spike given n 
cells firing in the upstream layer is now 



q n 



Pr(I > 9 — x)Pr(x = x)dx, 



where / is the synaptic input from the upstream layer without the additional noise com- 
ponent. If x > 9 then the neuron fires with probability 1 because the noise alone is 
enough to elicit a spike. If x < 9 — n, then the neuron can never fire as even the ad- 
dition of all upstream neurons delivering input would be insufficient to cross threshold. 
We can then rewrite q n as: 



1 



6-n 



2,71(7 -, 



exp 



X 



2a{ 



E 

= 10-3 



jL 
N 



dx 



+ 



27T0-, 



cxp 



dx. 



Nowotny & Huerta (2003) consider a similar expression (Equation 7 in their paper), 
although both the exact form of their expression, and their conclusion that it has little 
effect on the transition matrix, differ from ours. We denote by A no i sy the transition 
matrix describing these networks with added noise, generated by the new q n . 

Figure [7] plots the spike count JS divergence (Equation 1) as a function of 9 and 
a x . The main result is that adding noise produces lower values of JS divergence - and 
thus more consistent propagation of asynchronous inputs - at larger values of 9. This 
result agrees with the findings of van Rossum et al.| ( 2002| ) (cf. their Figure 2B and see 
Appendix for further comparisons with this study). Our result is also in agreement with 
Reyes ( |2003 ), who finds that adding white noise as a background current reduces the 
amount of synchrony present in networks. 

For each of the threshold values 9 shown, there is an optimal o x for asynchronous 
rate propagation (i.e., that minimizes the JS divergence). This amount of noise gives 
spontaneous firing rates of less than 12%, as measured by the probability Pr(x > 9). 
For the remainder of this section, we will take the optimal value of noise for each 
value of threshold. Figure [8p uses these values to provide another view of optimized 
JS divergence, which reveals the improvement in comparison with noise-free networks 
(Figure [5]). Moreover, 7 obs and 7 e j g do coincide in the noisy case, even for high values 
of 9 (at about 14.25 in for 9 = 7, branching ratio figure not shown). 

In contrast to the effects of inhibition, the addition of background noise does pro- 
duce substantial changes in the structure of the transition matrix. For example, compar- 
ing v4 noisy with A, spontaneous activity is now possible, as v oS is no longer the stationary 
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Figure 7: Impact of noise on input propagation. Surface shows spike count JS diver- 
gence as a function of 9 and the standard deviation of noise added to each neuron, a x . 
For each 9, there is a a x that optimizes asynchronous rate propagation. For 9 < 10 the 
relationship between 9 and optimal o x is linear. 

state. Instead, the stationary state v ss is now a function of 7. In particular, the noise 
contributes a nonzero probability from transitioning from any state to any other state, 
so the components of Awisy are strictly positive. By the Perron-Frobenius theorem, this 
means the system has a unique stationary state v ss whose components are all strictly 
positive (so it can never be v on or v fi). Computationally we find that the second largest 
eigenvalue now does not converge to 1 as 7 — >■ N; it does, however, attain a peak value 
near 1 at an intermediate 7 e i g , and at this point v$s and t> ff are also far from bimodal 
(Figure [SJ^B). Thus despite the differences in eigenstructure between A and A noif . y , the 
predominant features that define the existence of a persistent set of broad firing distri- 
butions are still apparent: there is an intermediate connectivity level 7 e i g at which all 
eigenvalues are significant, the second largest eigenvalue in particular is close to 1, and 
both the stationary distribution and the second eigenmode are far from bimodal. 

Finally, to put the role of noise to a more demanding test, we test its impact on 
the capacity of networks to discriminate between different input stimuli. For this, we 
calculate the rate discriminability by measuring the error rate given by the maximum 
likelihood estimator on T trials. Specifically, suppose the network produces output 
spike counts S^, . . . , £f[ under some fixed input stimulus level S. Since the trials are 
independent, the maximum likelihood estimator (MLE) chooses between two stimuli S 
and S' by selecting the one that is likelier to result in the given data, via the likelihood 
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Figure 8: Properties of noisy networks at optimal background noise levels. (A) The 
second largest eigenvalue A* peaks very close to 1 at an intermediate 7 e i g . The angle 
between v* and v on (dashed line) and between v ss and v 0B (dotted line) have large values 
at 7 ei g. (B) At this same value 7 e i g , all eigenmodes have significant contribution. Inset 
shows a typical broadly distributed histogram at 7 eig (indicated by the marker in A, B). 
(C) The spike count JS divergence, taking the optimal value of a x for each 9. With 
optimal noise values added, asynchronous rate propagation is dramatically improved 
for high-threshold networks. 



n 



ratio: 

" P(Sj\S) 

\p(si\s'Y 

If this product is greater than 1 the MLE chooses stimulus S; less than 1 and the MLE 
chooses stimulus S'. Assuming S and S' are equally likely a priori, the error rate is 
given by 



ER(S, S') = l~E 
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n 



P(Si\S) 



\P(Sl\S') 



> 1 
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P(Sj\S) 
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where the first expectation is taken over the distribution P(-\S') and the second over 
P(-\S). This produces an (JV + 1) x (JV + 1) matrix describing the MLE error rate for 
distinguishing S from S'. We then average over either the entries in the entire matrix to 
give either the average discriminability, or we average the entries in the superdiagonal 
to give the nearby discriminability, i.e, the discriminability between nearby rates. 
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Figure 9: Rate discriminability for noise-free networks with (A, B) 6 = 1, and (C, D) 
6 = 7, as well as (E, F) a network with 9 = 7 and optimal noise, a x = 4.2. (A, C, 

E) Nearby discriminability (dashed line) and average discriminability (solid line) for 
T = 25 trials plotted as a function of connectivity level 7. (A) For low-threshold net- 
works, rate discriminability is optimal at 7 eig . (C) For high-threshold networks, nearby 
discriminability is best near 7 eig , but this minimum is shifted for average discriminabil- 
ity. (E) Adding noise improves rate discrimination in high-threshold networks. (B, D, 

F) Maximum likelihood error rate plotted for every possible pair of input stimuli before 
averaging. The chosen networks are those that minimize average discriminability, as 
indicated by the markers in (A, C, E). 

Figure (9j^.C first summarizes discriminability in the absence of noise. Rate discrim- 
inability reaches its minimal value at 7 eig « 7 obs when 6 = 1; when 6 = 7, the minimal 
discriminability does not exactly coincide with either 7 eig or 7 obs . A glance at the MLE 
error rates sans averaging reveals the particular type of computation performed in each 
case: Figure [9j3D shows the error rates at the values of 7 that yield the lowest average 
discriminability, as indicated by the markers in Figure [9]\C. Low-threshold networks 
are able to accurately discriminate between rates over the entire stimulus space, includ- 
ing nearby rates. High-threshold networks, on the other hand, although able to perfectly 
distinguish a few rates in a limited intermediate range, cannot at all distinguish between 
nearby high rates or low rates. Rather, these networks are better suited to classifying 
input rates into two bins: low and high. 

Interestingly, the added background noise promotes better discriminability between 
rates in high-threshold networks, dropping the minimal level to values even below that 
of noise-free, low-threshold networks (Figure [9^). Moreover, the MLE error rates (Fig- 
ure [9^) show a marked improvement in the ability to distinguish between nearby rates 
at 7 e i g , as revealed by the tightly banded matrix structure. Not only, then, does noise 
improve rate propagation in neurons - it also changes the computation from a coarse- 
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grained classifier to one with more resolution. This is a specific example of the more 
general phenomenon of stochastic resonance (see, e.g. McDonnell & Abbott (2009); 
Longtin|([T993]». 



9 Discussion 

Summary 

In this paper, we study the transitions in feedforward network dynamics that occur 
as connectivity strength and firing threshold are varied. We characterize these transi- 
tions via critical branching, neutral stability, higher-order correlations, and broad firing 
distributions. After quantifying critical branching by computing the branching ratio, 
we show that neutral stability (persistence of firing patterns from one network layer 
to the next), together with statistical properties of the persistent patterns, can be pre- 
dicted via a spectral analysis of the underlying mean-field transition matrix. Through- 
out most of the parameter space, persistent activity is restricted to highly bimodal, syn- 
chronous responses, as found by Reyes (2003|), [Nowotny & Huerta| ( f2003| ), and |Litvak 



et al. ( |2003 ). However, there are "transition" connectivity levels that yield persistent, 
broadly-distributed spike count histograms with higher-order correlations and large re- 
sponse entropy. For low threshold networks this occurs simultaneously with (approxi- 
mately) critical branching, revealing that such networks are well-suited to transmitting 
rates without synchronization. On the other hand, high-threshold networks do not pro- 
duce both critical branching and broad response distributions at the same connectivity 
strength; when the former is satisfied, these networks tend to produce synchronous re- 
sponses. 

Interestingly, adding further biologically-motivated features increased the robust- 
ness of transitions in high-threshold networks. In particular, simulations and spectral 
analysis show that including an inhibitory cell population extended the connectivity 
range that yields asynchronous propagation of inputs. Adding zero-mean noise to each 
neuron had a similar effect and also improved the discriminability of inputs, echoing 



the findings of van Rossum et al. ( 2002| ) in integrate-and-fire networks. 



We conclude that networks with low firing thresholds, or those in which intrinsic 
noise elevates firing probabilities, exhibit a set of dynamical and statistical signatures 
associated with "critical" transitions in network activity. 

Connections with the criticality literature 

We now discuss links with the broader literature on criticality, which suggests that the 
brain may operate at a state characterized by complex dynamics, significant higher- 
order correlations, and enhanced computational properties. This is often described as 
operating on the boundary between ordered and irregular (or chaotic) activity. In partic- 
ular, such systems can flexibly perform a wide range of operations on time-dependent 
inputs when their recurrent networks lie near the "critical" state, which is defined by 
calculating the expected neutral separation of trajectories using a mean-field model 



(Bertschinger & Natschlager, 2004, Legenstein & Maass 2007) 
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Along these lines |Beggs & Plenz| ( |2003[ ) motivates a feedforward model based on 
array recordings ^1 Here, the authors compute the mutual information between the 2 N 
possible binary "words" at the first and last layers. Intriguingly, they numerically show 
- for the low threshold case 9 = 1- that the mutual information is maximized for 
the same parameters at which critical branching occurs. Our finding in the averaged, 
mean-field setting echoes this result. An interesting extension of our work would be to 



explain the findings of Beggs & Plenz (2003) via the spectral properties of the allied 
layer-to-layer transition matrix between binary words. 

A number of experimental and theoretical studies focus on neuronal avalanches as 
a signature of critical neural connectivity. These are cascades of neural activity whose 
sizes obey a power law distribution (Beggs ^fe Plenz[ |2003^ |Kitzbichler et al.[ |2009 



Hahn et al.[ [20101 |Petermann et al.[ |2009t |Hennig et"aL| [2009t [Mora & Bialek[ [2011) 
Avalanches have been shown to arise in some neutrally stable models of neural networks 
(Haldeman & Beggs, 2005), and thus have been described as a signature of optimal 
computation. An interesting target for future work would be to extend our mean-field 
analysis to predict the occurrence of avalanches over multiple network layers, and to 
and study their role in encoding stimuli. 



Verifying and extending the model 

We imposed a number of simplifications in this paper to achieve analytical tractability. 
The most prominent of these is that our neurons are modeled as simple thresholding 
units with no intrinsic properties or time dependence ([Nowotny~& Huerta, 2003). How- 



ever, our results agree those in networks of more realistic neurons ( |van Rossum et al. 



2002^ |Reyesl |2003 [ |Rosenbaum et al.[ |20 1 0[ ) . We therefore believe that our findings will 
prove to be quite general. 

Another possible limitation is that the numerical studies presented above utilize a 
fixed value of iV = 20 neurons. However, our analytical results on spectral properties 
of the transition operator are independent of this choice. Moreover, we verified that our 
main qualitative results are preserved, e.g., for the larger value N = 100 (taking 9 = 
1, 5, 10, 20, 35); data not shown. In more detail, as with the smaller network, the system 
at A = 100 remains well-described by a mean-field transition matrix (in fact, due to 
the larger population size, it is even better fit). The eigenstructure of these matrices 
reveals an intermediate 7 e i g at which the second dominant eigenmode is both persistent 
and broadly-distributed, and there is significant contribution from all eigenvalues as 
well as maximal response entropy. For 9 = 1, this value overlaps with 7 obs , but as 
threshold increases, the gap between the two widens; accordingly, the spike count JS 
divergence increases. As for the N = 20 case, while inhibition does continue to increase 
this range, the optimal performance is not improved. The addition of noise in large 
networks, however, has similar beneficial effects: an optimal amount of noise lowers the 
minimum JS divergence to around 0.32 for high values of 9. This amount of background 
noise required generates less than 10% probability of spontaneous firing, similar to 



^The authors argue that a feedforward model is appropriate in this context as elec- 
trode sites are rarely active more than once during the cascades of neural activity that 
they study. 
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that obtained at N = 20. However, one difference at N = 100 is that the optimal 
performance under the JS divergence metric D is lower: when 6 = 1, the optimal 
network attains at best a score of 0.58, compared to 0.33 for iV = 20. Moreover, in 
the larger network the "well" in D values near the optimal 7 value is even narrower, 
requiring a finer tuning of 7. These findings suggest that, while our findings remain 
qualitatively similar for larger networks, there may be interesting new phenomena in 
the continuum limit of large N - an interesting subject of future study. 

On another note, we focused on only a few of the many metrics of signal propagation 
and coding that could be applied to the networks at hand. We note further results on one 
of these in the appendix, that used by |van Rossum et aL ( 2002[ ) to measure propagation 
of firing rates via trial-to-trial variance of responses in deep layers. This showed similar 
results to our measure D of JS divergence between input and output distributions over 
intermediate firing rates; the two measures showed distinctions at extreme firing rates, 
assessing the quiescent or saturating patterns that occur there differently (see appendix). 

Finally, we have concentrated solely on networks with a feedforward connectivity 
structure. However, these networks are equivalent to a synchronously-updated discrete- 
time network with random recurrent connections (including connections to themselves) 
under the annealing approximation (Bertschinger & Natschlager, 2004[ ). Thus, to the 
extent that these assumptions hold, the results of this paper may also be applied to the 
evaluation of persistent activity in recurrent networks. 

We close by noting experimental predictions of our work, as could be tested di- 
rectly in in-vitro feedforward networks (using the techniques of |Reyes| ( |2003[ )), or, with 
the considerations above, could predict dynamics in recurrent systems as well. First, 
asynchronous rate propagation should become possible when the membrane potentials 
of neurons are biased upwards (equivalent to decreasing the spike-generation thresh- 
old). Second, this should also occur when sufficient noise is added local to each cell 
(some white noise has already been shown to reduce synchrony in [Reyes (2003)). Fi- 
nally, adding an inhibitory population at each layer should increase the robustness of 
asynchronous propagation to network connectivity and synaptic strength. 
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Appendix 

Derivation of mean-field Markov chain 

We outline how to obtain a formula for the mean-field Markov chain given the transition 
matrix for the original Markov chain in the space of 2 N firing patterns. The first step in 
this derivation is to determine the probability that, given true connectivity matrix E L , 
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the instantiated "effective" connectivity matrix is E L : 

P{E L \E L )=p K ^\l-p) K ^-^\ 

where K(M) is the number of nonzero elements in matrix M. Each element of the 
transition matrix in pattern space is then given by 

P(x L = x|x L _! = x, E L ) = P{E L \E L ) ■ I [ar = &(E L x - 6) . 

where I denotes the indicator function. We will in two steps reduce the dimension of 
the system to condense pattern space into rate space. First, summing over x: 



P(S L = m|x L _i = x, E L ) = ^2 = x\* L -i = x, E L ) ■ I 



N 

in 

i=l 



Another sum gives the (N + 1) x (N + 1) transition matrix conditioned on the connec- 
tivity matrix E L : 

P(S L = m\S L ^i = n, Ei) = ^P(S' L = m|x L _i = x,Sl-\,E l ) ■ P{x L -i\S L -i, E L ) 

X 

= ^P(S L = m|x L _i = x,E L ) ■ P(x£_i|Sx,_i). 

Finally, averaging over every possible E L for the fixed 7, we obtain 

P(S L = m|S L _! = n) = p (S L = m\S L ^ = n, E L ) ■ P(E L ), 

which gives the elements of the mean-field transition matrix. Through these steps, the 
explicit derivation of the mean-field model from the original setup is demonstrated. 

Validity of the mean-field Markov chain model 

In this section we investigate the validity of the mean-field Markov chain model. Specif- 
ically, for a fixed network connectivity structure, we first estimate the true spike count 
distribution P 5 in response to input rate S through Monte Carlo simulation. We then 
compare this to the distribution predicted by the mean-field Markov chain P^ F = 
Pnput^ 4 by computing the Jensen-Shannon divergence between these two distributions. 
Finally, we average the JS divergence over 100 instantiations of all possible input rates 
and over 20 random networks for that particular C and p. 

Overall, the mean-field distribution approximates the true spike count distribution 



quite accurately, as shown in Figure 10 \. The white curve overlain on the figure in- 
dicates the level set 7 ~ 7 e i g . Note that agreement is perfect for fully connected net- 
works. The only major challenge to the accuracy of the mean-field approximation is 
in the deterministic limit of low C and high p. Since C is low there are few trials for 
the stochastic synapses, and the high p additionally ensures that over repetitions of the 
same stimulus S the activity follows a nearly deterministic trajectory, resulting in P 5 
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Figure 10: Investigating the validity of the mean-field model. (A) Average JS diver- 
gence between the distribution after simulation through five layers and that predicted 
by the mean-field model for varying C, p. The mean-field model breaks down in the 
deterministic limit of small C and high p. The white curve represents 7 m 7 eig 1.3. 
(B - D) Example spike count distributions from 1000 Monte Carlo simulations (grey 
bars) and their mean-field predictions (black line) for three orders of magnitude of the 
JS divergence. Parameters are (B) C = 3, p = 1, S = 3 for the worst fit, (C) C — 6, 
p = 0.5, S = 9 for the intermediate fit, and (D) C = 5, p = 0.26, S = 11 for the best 
fit. 

having a narrower distribution than the mean-field predicts. Example histograms are 
shown in Figure [TO^C to give an interpretation of values for the JS divergence. 

When repeated for 9 = 7 (data not shown), the mean-field model even better cap- 
tured the true distributions, with a maximal JS divergence of 0.15 in the region of in- 
accuracy in the limit of p ~ 1 and C ~ 7 6 S . As a final check, we also compared the 
means of the response distributions and found that, as expected, the averaged error was 
below machine epsilon (results not shown). 

Analytical results for the eigenstructure of the mean-field transition 
matrix 

Proposition 1 For any threshold 6 > 1 and connectivity < 7 < N, the transition 
matrix A possesses a unique stationary state it = v & such that it A = ir. 

Proof: Let n = (po, ■ ■ ■ ,Pn)- Then from direct matrix multiplication, the mth compo- 
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nent of the vector ttA is 

(nA) m 

The stationary state requires p r , 



N 

E 

n=0 



Pn 



,N-m 



(Ap) m for all m, i.e. 



A? 



Pr. 



= E 



n=0 



iV 



■in 



Qn) 



N-m 



for all m = 0, . . . , N. In particular, when m = 0, this becomes 



Po = Po 



n=l ^ ' 



The summed term on the right hand side must be zero. However, note that each of 
the components of this sum is nonnegative, so they each must be zero, i.e., for each 
n = 1, . . . , N either p n = or (1 — q n ) N = 0. We could have (1 — q n ) N = for 
a particular n if q n = 1. However, q n can never be 1 for sensible parameter values of 
9 > and < 7 < N. Therefore, we must have p n — for all n — 1, . . . , N, and thus 
Po = 1. The resulting stationary state is therefore unique and precisely equal to i> off . □ 

Proposition 2 Suppose the eigenvectors of A have limits as 7 — >■ N. Then, A has an 
eigenvalue A* — > 1 as 7 — > N with corresponding eigenvector v* that converges to a 
vector in the span ofv on and v oS . 

Proof: First consider 



Qn 



E 

k=0 



n—k 



as 7 — > N. For n < 9, q n = by definition. For n > 9, the sum on the right side of 
this equation approaches since n > k, so q n — > 1. Below we summarize for various 
m and n the limit of q™(l — q n ) N ~ m as 7 — > N: 



n > 9 : m = 
< m < N 
m = N 
n < 9 : m = 
< m < iV 
m = N 



q° n (l-q n ) N ^0 
C(l - ?n) iV - m -f 



Qn 



-> 1 

gn)^-™ ^ 
gn)° -)• 0. 



Now suppose A is an eigenvalue of A with corresponding eigenvector v for some 7. 
Then, A and v satisfy 



N 



n=0 



N 



rn 



a 



Qn) 



N-m 



\Vr. 
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for all m = 0, 



, N. In particular, for m = 0, we have: 



n=0 ^ ' n=0+l ^ 







which, taking 7 — )■ N, reduces to the following: 

e 

^2 Vn = Xv C 



n=0 



all other terms having vanished. Here, v is the limit of v, which exists by assumption, 
and A is the limit of A, which exists by the continuity of eigenvalues. For m = N, a 
similar expression is obtained: 



N 

v n = Xv N . 



n=6+l 

Finally, for < m < N: 

= Xv m 

This last equation reveals two possibilities: either A = or v m = for < m < N. 
The latter case implies that A = 1, thus the second largest eigenvalue of A converges to 
1 with limiting eigenvector in the span of v on and v n- All other eigenvalues converge 
toO. □ 



Proposition 3 Suppose N E — Nj > 6. Then, the (four- dimensional) transition matrix 
Aj n has a unique (two-dimensional) stationary state ix corresponding to no inhibitory 
and no excitatory neurons spiking at a layer. Moreover, the second largest eigenvalue 
converges to 1, and assuming the eigenvectors of A in have limits as 7 —> N = Ne + Nj, 
then its corresponding eigenvector converges to the space spanned by the vector corre- 
sponding to all inhibitory and excitatory neurons firing, and the vector corresponding 
to all inhibitory and excitatory neurons being quiescent. 

Proof: Because of the structure of A{ n , this proposition follows similarly to those of the 
previous two propositions. □ 



Another metric for rate propagation 

In addition to the measures described in the main text, we also considered the metric 



for rate propagation following |van Rossum et al.| ( |2002[ ). Define the rate dissimilarity 
between the input rate S/N and the rate at the Lth layer Sl/N via: 

RD( 7 , 6) = E s [E trials [(S L /N - S/N) 2 \S}} . 

There are two potential sources of poor performance according this quantification: (1) 
if the mean value of Sl is far from S, or (2) if Sl has large variance. As we see in 
Figure [TT]\, when 9 = 1 the rate dissimilarity reaches its minimal value at critical con- 
nectivity 7 e i g w 7 obs , suggesting that for low-threshold neurons, these networks are best 
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Figure 11: Rate dissimilarity for (A, B) 9 = 1, and (C, D) 9 = 7. (A, C) Rate dis- 
similarity plotted as a function of connectivity level 7. For (A) low-threshold networks, 
rate propagation is optimal at 7 obs . For (C) high-threshold networks, this is no longer 
the case. (B, D) Rate dissimilarity averaged over intermediate values (S = 6, . . . , 15, 
solid line) and extreme rates (S = 0, . . . , 5 and 16, ... , 20, dashed line). (E) Mean of 
spike count and (F) rate dissimilarity for high-threshold (triangles) and low-threshold 
(circles) networks plotted as a function of input stimulus. The networks shown in (E) 
and (F) are those that minimize the stimulus-averaged rate dissimilarity, as indicated by 
the markers in (A, C). 

able to propagate rates through the network. Outside of this intermediate connectivity 
range, the dissimilarity between input and output returns to high values. 

When threshold is raised, the dissimilarity curve changes shape and no longer has 
a sharp minimum at 7 e i g (Figure [Tip); instead, there is a robust minimum. Moreover, 
the minimal rate dissimilarity values for the low- and high-threshold networks are at 
comparable values. This may at first seem surprising, given that the high-threshold 
networks produce strong synchrony, and this should lead to large response variance. 
What is actually happening is an effect of both the increasing nullity of A and averag- 
ing over all stimuli. In Figure [TT^ the stimulus -dependent mean of the output at the 
5th layer is plotted as a function of the stimulus for the networks that minimize average 
rate dissimilarity, indicated by the markers in Figure [TT}\C, for both 9 = 1 (circles) and 
9 = 7 (triangles). It is immediately clear that the low-threshold network better propa- 
gates intermediate rates as compared to the high-threshold network. By calculating the 
stimulus-dependent rate dissimilarity, rather than taking the uniform average, we see in 
Figure [Tip the difference between these two networks. While high-threshold networks 
can propagate low and high rates better than low-threshold networks, only the latter can 
propagate intermediate rates. This is because high-threshold networks produce bimodal 
responses at the connectivity value required to propagate rates. To make this point more 
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apparent, in Figure[TTf3C we have crudely separated the rate dissimilarity averaged over 
intermediate rates (solid lines) and extreme (either high or low) rates (dashed lines). 
This reveals that low-threshold networks perform better than high-threshold networks 
for intermediate rates. 

Faced with the subtlety of these results, in the main text we use the spike count 
JS divergence in order to unambiguously reveal network properties that support the 
propagation of asynchronous input distributions. 
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